28 research outputs found

    Adaptive Multicore Scheduling for the LTE Uplink

    Get PDF
    International audienceThe next generation cellular system of 3GPP is named Long Term Evolution (LTE). Each millisecond, a LTE base station receives information from up to one hundred users. Multicore heterogeneous embedded systems with Digital Signal Processors (DSP) and coprocessors are power efficient solutions to decode the LTE uplink signals in base stations. The LTE uplink is a highly variable algorithm. Its multicore scheduling must be adapted every millisecond to the number of connected users and to the data rate they require. To solve the issue of the dynamic deployment while maintaining low latency, one approach would be to find efficient on-the-fly solutions using techniques such as graph generation and scheduling. This approach is opposed to a static scheduling of predefined cases. We show that the static approach is not suitable for the LTE uplink and that present DSP cores are powerful enough to recompute an efficient adaptive schedule for the LTE uplink most complex cases in real-time

    Optimization of automatically generated multi-core code for the LTE RACH-PD algorithm

    Get PDF
    Embedded real-time applications in communication systems require high processing power. Manual scheduling devel-oped for single-processor applications is not suited to multi-core architectures. The Algorithm Architecture Matching (AAM) methodology optimizes static application implementation on multi-core architectures. The Random Access Channel Preamble Detection (RACH-PD) is an algorithm for non-synchronized access of Long Term Evolu-tion (LTE) wireless networks. LTE aims to improve the spectral efficiency of the next generation cellular system. This paper de-scribes a complete methodology for implementing the RACH-PD. AAM prototyping is applied to the RACH-PD which is modelled as a Synchronous DataFlow graph (SDF). An efficient implemen-tation of the algorithm onto a multi-core DSP, the TI C6487, is then explained. Benchmarks for the solution are given

    Pre- and post-scheduling memory allocation strategies on MPSoCs

    Get PDF
    6 pagesInternational audienceThis paper introduces and assesses a new method to allocate memory for applications implemented on a shared memory Multiprocessor System-on-Chip (MPSoC). This method first consists of deriving, from a Synchronous Dataflow (SDF) algorithm description, a Memory Exclusion Graph (MEG) that models all the memory objects of the application and their allocation constraints. Based on the MEG, memory allocation can be performed at three different stages of the implementation process: prior to the scheduling process, after an untimed multicore schedule is decided, or after a timed multicore schedule is decided. Each of these three alternatives offers a distinct trade-off between the amount of allocated memory and the flexibility of the application multicore execution. Tested use cases are based on descriptions of real applications and a set of random SDF graphs generated with the SDF For Free (SDF3) tool. Experimental results compare several allocation heuristics at the three implementation stages. They show that allocating memory after an untimed schedule of the application has been decided offers a reduced memory footprint as well as a flexible multicore execution

    Memory Bounds for the Distributed Execution of a Hierarchical Synchronous Data-Flow Graph

    Get PDF
    International audienceThis paper presents an application analysis technique to define the boundary of shared memory requirements of Multiprocessor System-on-Chip (MPSoC) in early stages of development. This technique is part of a rapid prototyping process and is based on the analysis of a hierarchical Synchronous Data-Flow (SDF) graph description of the system application. The analysis does not require any knowledge of the system architecture, the mapping or the scheduling of the system application tasks. The initial step of the method consists of applying a set of transformations to the SDF graph so as to reveal its memory characteristics. These transformations produce a weighted graph that represents the different memory objects of the application as well as the memory allocation constraints due to their relationships. The memory boundaries are then derived from this weighted graph using analogous graph theory problems, in particular the Maximum-Weight Clique (MWC) problem. Stateof-the-art algorithms to solve these problems are presented and a heuristic approach is proposed to provide a near-optimal solution of the MWC problem. A performance evaluation of the heuristic approach is presented, and is based on hierarchical SDF graphs of realistic applications. This evaluation shows the efficiency of proposed heuristic approach in finding near optimal solutions

    Memory Bounds for the Distributed Execution of a Hierarchical Synchronous Data-Flow Graph

    Get PDF
    International audienceThis paper presents an application analysis technique to define the boundary of shared memory requirements of Multiprocessor System-on-Chip (MPSoC) in early stages of development. This technique is part of a rapid prototyping process and is based on the analysis of a hierarchical Synchronous Data-Flow (SDF) graph description of the system application. The analysis does not require any knowledge of the system architecture, the mapping or the scheduling of the system application tasks. The initial step of the method consists of applying a set of transformations to the SDF graph so as to reveal its memory characteristics. These transformations produce a weighted graph that represents the different memory objects of the application as well as the memory allocation constraints due to their relationships. The memory boundaries are then derived from this weighted graph using analogous graph theory problems, in particular the Maximum-Weight Clique (MWC) problem. Stateof-the-art algorithms to solve these problems are presented and a heuristic approach is proposed to provide a near-optimal solution of the MWC problem. A performance evaluation of the heuristic approach is presented, and is based on hierarchical SDF graphs of realistic applications. This evaluation shows the efficiency of proposed heuristic approach in finding near optimal solutions

    Building a RTOS for MPSoC Dataflow Programming

    No full text
    International audienceMultiprocessor Systems-on-Chip (MPSoC) are becoming the standard high performance Digital Signal Processing (DSP) systems. Hardware complexity abstraction is needed to enable efficient MPSoC programming. A major challenge of MPSoC programming is efficiently handling the combination of new features necessary in a MPSoC operating system: load balancing and efficient use of the parallel resources, with the more traditional features of Real-Time Operating Systems (RTOS): resource sharing between applications, task priorities and reactivity to events. This paper presents a method to combine dataflow methods and RTOS features. The resulting system prototypes an RTOS for symmetric multiprocessing MPSoCs whose inputs are dataflow graphs of applications. The prototype is built on the uC/OS-II RTOS. Experimental results are given on a 3GPP Long Term Evolution algorithm executed on a 4-core MPSoC

    PREESM: A Dataflow-Based Rapid Prototyping Framework for Simplifying Multicore DSP Programming

    Get PDF
    International audienceThe high performance Digital Signal Processors (DSP) currently manufactured by Texas Instruments are heterogeneous multiprocessor architectures. Programming these architectures is a complex task often reserved to specialized engineers because the bottlenecks of both the algorithm and the architecture need to be deeply understood in order to obtain a fairly parallel execution. The PREESM framework objective is to simplify the programming of multicore DSP systems by building on dataflow programming methods. The current functionalities of this scalable framework cover memory and time analysis, as well as automatic deadlock-free code generation. Several tutorials are provided with the tool for fast initiation of C programmers to multicore DSP programming. This paper demonstrates PREESM capabilities by comparing simulation and execution performances on a stereo matching algorithm prototyped on the TMS320C6678 8-core DSP device

    Physical Layer Multi-Core Prototyping: A Dataflow-Based Approach for LTE eNodeB

    No full text
    International audienceBase stations developed according to the 3GPP Long Term Evolution (LTE) standard require unprecedented processing power. 3GPP LTE enables data rates beyond hundreds of Mbits/s by using advanced technologies, necessitating a highly complex LTE physical layer. The operating power of base stations is a significant cost for operators, and is currently optimized using state-of-the-art hardware solutions, such as heterogeneous distributed systems. The traditional system design method of porting algorithms to heterogeneous distributed systems based on test-and-refine methods is a manual, thus time-expensive, task. Physical Layer Multi-Core Prototyping: A Dataflow-Based Approach provides a clear introduction to the 3GPP LTE physical layer and to dataflow-based prototyping and programming. The difficulties in the process of 3GPP LTE physical layer porting are outlined, with particular focus on automatic partitioning and scheduling, load balancing and computation latency reduction, specifically in systems based on heterogeneous multi-core Digital Signal Processors. Multi-core prototyping methods based on algorithm dataflow modeling and architecture system-level modeling are assessed with the goal of automating and optimizing algorithm porting. With its analysis of physical layer processing and proposals of parallel programming methods, which include automatic partitioning and scheduling, Physical Layer Multi-Core Prototyping: A Dataflow-Based Approach is a key resource for researchers and students. This study of LTE algorithms which require dynamic or static assignment and dynamic or static scheduling, allows readers to reassess and expand their knowledge of this vital component of LTE base station design

    SPIDER: A Synchronous Parameterized and Interfaced Dataflow-Based RTOS for Multicore DSPs

    Get PDF
    International audienceThis paper introduces a novel Real-Time Operating System (RTOS) based on a parameterized dataflow Model of Computation (MoC). This RTOS, called Synchronous Parameterized and Interfaced Dataflow Embedded Runtime (SPiDER), aims at efficiently scheduling Parameterized and Interfaced Synchronous Dataflow (PiSDF) graphs on multicore architectures. It exploits features of PiSDF to locate locally static regions that exhibit predictable application behavior. This paper uses a multicore signal processing benchmark to demonstrate that the SPiDER runtime can exploit more parallelism than a conventional multicore task scheduler. By comparing experimental results of the SPiDER runtime on an 8-core Texas Instruments Keystone I Digital Signal Processor (DSP) with those obtained from the OpenMP framework, latency improvements of up to 26% are demonstrated

    Applying the Adaptive Hybrid Flow-Shop Scheduling Method to Schedule a 3GPP LTE Physical Layer Algorithm onto Many-Core Digital Signal Processors

    Get PDF
    International audienceCurrently, Multicore Digital Signal Processor (DSP) platforms are commonly used in telecommunications baseband processing. In the next few years, high performance DSPs are likely to combine many more DSP cores for signal processing with some General-Purpose Processor (GPP) cores for application control. As the number of cores increases in new DSP platform designs, scheduling of applications is becoming a complex operation. Meanwhile, the variability of the scheduled applications also tends to increase as applications become more sophisticated. Such variations require runtime adaptivity of application scheduling. This paper extends the previous work on adaptive scheduling by using the Hybrid Flow-Shop (HFS) scheduling method, which enables the device architecture to be modeled as a pipeline of Processing Elements (PEs) with multiple alternate PEs for each pipeline stage. HFS scheduling is applied to the scheduling of 3rd Generation Partnership Project (3GPP) Long Term Evolution (LTE) telecommunication standard Uplink Physical Layer data processing (PUSCH). The experiments, conducted on an ARM Cortex-A9 GPP, show that an HFS scheduling algorithm has an overhead that increases very slowly with the number of PEs. This makes the method suitable for executing the adaptive scheduling in less than 1 ms for the 501 actors of a LTE PUSCH dataflow description executed on a 256-core architecture
    corecore